Abstract: This study presents a method for making remote photoplethysmography (rPPG) networks
more efficient through pruning, yielding small yet dense models that perform well even with
limited training samples. The approach reduces network complexity while maintaining accuracy,
making the models suitable for real-time applications. Experimental results show that the
pruned networks achieve performance competitive with their larger counterparts, highlighting
their potential for efficient deployment in resource-constrained environments.


1. Introduction 


Remote Photoplethysmography (rPPG) is an innovative technology that allows for the 
non-contact measurement of physiological signals such as heart rate by analyzing subtle 
variations in skin color captured through a video camera. This non-invasive approach offers 
significant advantages over traditional contact-based methods, particularly in contexts 
requiring continuous monitoring without causing discomfort to the subject. The ability to 
remotely monitor heart rate has applications in various fields including healthcare, fitness, 
and even human-computer interaction. Despite its potential, the deployment of rPPG 
technology faces several challenges, primarily related to the computational complexity 
and the volume of training data required for accurate measurements. Conventional rPPG 
networks are typically large and resource-intensive, necessitating powerful hardware and 
extensive datasets to train the models effectively. This creates a barrier to their application 
in real-time scenarios and on devices with limited computational power, such as mobile 
phones and wearable devices. 

Pruning, a well-established technique in deep learning, offers a solution to these challenges 
by reducing the size of neural networks while aiming to preserve their performance. By 
strategically removing less significant parameters, pruning can lead to the development of 
smaller, faster, and more efficient models. This process not only reduces the computational 
load but also enhances the model’s suitability for deployment in resource-constrained 
environments. The challenge, however, lies in maintaining the accuracy and robustness of 
the pruned network, especially when training data is limited. 

This paper proposes a novel pruning strategy specifically tailored for rPPG networks. The 
goal is to create compact and efficient models that retain high performance even when 
trained with a limited number of samples. The proposed approach involves pre-training 
the rPPG network to establish a baseline, followed by pruning based on criteria such as 
weight magnitude and parameter contribution to the network’s output. Finally, the pruned 
network is fine-tuned to recover any performance loss and optimize efficiency. The efficacy 
of the proposed method is demonstrated through extensive experiments on standard 
rPPG datasets. These experiments highlight the ability of the pruned networks to achieve 
performance levels comparable to those of the original, larger models. Additionally, the 
pruned models exhibit significant improvements in computational efficiency, making them 
viable for real-time applications and deployment on less powerful devices. The results 
also show that the pruned networks generalize well, maintaining robust performance on
unseen data. 

In summary, this research addresses a critical gap in the application of rPPG technology by 
providing a method to develop small, dense, and efficient networks. By leveraging pruning 
techniques, the paper demonstrates how high-performing rPPG models can be trained 
with limited data, thus broadening the potential for practical implementations of rPPG 
technology. The findings contribute to the ongoing efforts to optimize neural networks 
for real-world applications, ensuring that advanced machine learning techniques can be 
effectively utilized across diverse and constrained environments. 


2. Related Work 



The advancement of rPPG networks and the optimization techniques proposed in this 
paper are situated within a broader context of deep learning innovations and applications. 
This section explores related work in areas such as network pruning, few-shot learning, 
and self-supervised learning, which inform and enhance our understanding of optimizing 
rPPG networks. 

Pre-training has been a significant advancement in enhancing the performance and 
efficiency of deep learning models. [1] introduced BERT, which demonstrated the power 
of pre-training on large text corpora for natural language understanding. The same idea
carries over to rPPG networks, where pre-training can establish a robust baseline before
pruning is applied to reduce network size and improve computational efficiency.
[16] proposed Deep Residual Learning for image recognition, which serves as a foundation
for many modern deep learning architectures, including those used in rPPG networks.
Pruning such networks after pre-training can reduce their size without a significant
loss in accuracy, making them more suitable for real-time applications.

Few-shot learning aims to enable models to learn new tasks from very few training
examples. [12] introduced Model-Agnostic Meta-Learning (MAML), which enables fast
adaptation of deep networks to new tasks with minimal data. This approach is particularly
relevant to rPPG networks, because large annotated datasets are often difficult to obtain.
By leveraging few-shot learning techniques, rPPG networks can be trained efficiently with
limited data, improving their practicality for real-world deployment.

Self-supervised and contrastive learning methods have also contributed significantly to
deep learning. Temporal Cycle-Consistency Learning [2] shows how models can learn robust
representations from video without extensive labels, while Supervised Contrastive
Learning [18] shows how a contrastive objective can exploit the labels that are available
to learn more discriminative representations. These approaches are beneficial for rPPG
networks, where acquiring labeled training data is expensive and time-consuming. By
incorporating self-supervised learning, rPPG networks can leverage large amounts of
unlabeled video data to improve their performance and generalization.

Research on video classification provides valuable insights into processing and
understanding temporal data, which is essential for rPPG networks. Studies such as
[17, 21, 32, 37] have developed architectures and techniques for video classification
that can be adapted for rPPG signal extraction. These works highlight the importance of
capturing temporal dynamics and spatial features simultaneously, which aligns with the
requirements for accurate heart rate estimation from facial videos.

Emerging technologies, such as memristor-based systems, offer promising avenues for 
enhancing the efficiency of deep learning models. Research on memristor crossbar arrays 
[24, 41, 58, 59] has demonstrated their potential for implementing neural networks with high 
computational efficiency and low power consumption. Integrating memristor technologies 
with pruned rPPG networks could further reduce their computational footprint, making
them more feasible for deployment on resource-constrained devices like smartphones and 
wearables. 

Several studies have explored different methods for optimizing deep learning models, 
such as [3-7, 13-15, 19, 22, 26, 34, 40, 51]. These works provide a comparative backdrop 
for evaluating the effectiveness of pruning and fine-tuning strategies. By benchmarking
against these methods, the proposed pruning approach for rPPG networks can be validated
and positioned as a competitive solution for efficient heart rate estimation.

In summary, the related work spans various domains of deep learning, including
pre-training, few-shot learning, self-supervised learning, video classification, and memristor
technologies. These advancements collectively inform and enhance the development of
optimized rPPG networks, ensuring they are robust, efficient, and suitable for real-world
applications.


3. Method


Pre-training the rPPG Network: The first step in our methodology is to pre-train the rPPG
network on the available training data, which establishes a performance baseline.
Pre-training uses a conventional rPPG network architecture, which typically consists of
several convolutional layers followed by fully connected layers. The network is trained to
detect and quantify subtle changes in skin color across video frames, which correspond to
blood volume changes and hence to the heart rate. Standard techniques such as data
augmentation and regularization are employed to enhance the network's robustness and
prevent overfitting, which is a particular risk given the limited dataset.
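
The text does not specify how a heart rate value is read off the network's output. A common choice, assumed here, is to take the dominant frequency of the predicted pulse waveform within a physiologically plausible band (0.7 to 4 Hz, i.e. 42 to 240 bpm). The sketch below uses a naive zero-padded DFT from the standard library; the function name `estimate_hr_bpm`, the band limits, the padding factor, and the 30 fps frame rate are illustrative assumptions, not the authors' implementation.

```python
import math


def estimate_hr_bpm(signal, fs, f_lo=0.7, f_hi=4.0, pad=4):
    """Heart rate (bpm) = dominant frequency of a pulse waveform in [f_lo, f_hi] Hz.

    Assumes `signal` is the 1D pulse waveform predicted for one window of video
    sampled at `fs` frames per second. The band limits are common physiological
    bounds, not values taken from the paper.
    """
    n = len(signal)
    n_fft = pad * n  # implicit zero-padding gives a finer frequency grid
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC component
    best_f, best_power = 0.0, -1.0
    for k in range(1, n_fft // 2):
        f = k * fs / n_fft
        if not f_lo <= f <= f_hi:
            continue
        # naive DFT bin k of the zero-padded signal (padded samples contribute 0)
        re = sum(v * math.cos(2 * math.pi * k * i / n_fft) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * i / n_fft) for i, v in enumerate(x))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return 60.0 * best_f


fs = 30.0  # assumed camera frame rate
pulse = [math.sin(2 * math.pi * 1.2 * i / fs) for i in range(300)]  # 10 s at 1.2 Hz
print(round(estimate_hr_bpm(pulse, fs), 1))  # 72.0
```

In practice an FFT (for example `numpy.fft.rfft`) would replace the quadratic-time loop; the loop is kept here only so the sketch has no dependencies.
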
Pruning Strategy: Once the rPPG network is pre-trained and a baseline performance is
established, the pruning phase begins. The pruning strategy systematically identifies and
removes less significant parameters from the network. Two criteria guide this process:
weight magnitude and each parameter's contribution to the network's output. Parameters
with small magnitudes, which usually have less impact on the network's activations, are
prime candidates for removal. In addition, the sensitivity of the network's output to each
parameter is analyzed so that pruning minimizes performance degradation. This strategy
aims to retain the most critical parameters, preserving the network's ability to
accurately measure heart rate.
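
The two criteria above can be combined into a single score per parameter. The paper does not say how sensitivity is measured or how the criteria are weighted, so this minimal sketch assumes a first-order Taylor estimate |w · ∂L/∂w| (a common proxy for the loss change caused by zeroing w), normalizes both scores, and mixes them with a hypothetical weight `alpha` before removing the lowest-scoring fraction. The function name `pruning_mask` is likewise illustrative.

```python
def pruning_mask(weights, grads, prune_frac, alpha=0.5):
    """Return a 0/1 mask that removes the `prune_frac` lowest-scoring weights.

    score = alpha * |w| / max|w| + (1 - alpha) * |w * g| / max|w * g|,
    where g is the loss gradient for w. `alpha` is a hypothetical mixing knob.
    """
    mag = [abs(w) for w in weights]
    taylor = [abs(w * g) for w, g in zip(weights, grads)]
    max_mag = max(mag) or 1.0  # guard against an all-zero layer
    max_taylor = max(taylor) or 1.0
    scores = [alpha * m / max_mag + (1 - alpha) * t / max_taylor
              for m, t in zip(mag, taylor)]
    n_prune = int(prune_frac * len(weights))
    # indices sorted from least to most important
    order = sorted(range(len(weights)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


w = [0.9, -0.05, 0.4, 0.01]
g = [0.1, 0.2, 0.0, 5.0]
print(pruning_mask(w, g, 0.5, alpha=1.0))  # magnitude only:   [1, 0, 1, 0]
print(pruning_mask(w, g, 0.5, alpha=0.0))  # sensitivity only: [1, 0, 0, 1]
```

The two criteria can disagree: the tiny weight 0.01 has a large gradient, so the sensitivity score keeps it, whereas a pure magnitude criterion would remove it.
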
Layer-wise Pruning and Fine-tuning: Pruning is performed layer by layer, starting from
the layers closest to the input and progressing towards the output layers. This approach
reduces complexity gradually while the network's performance is continuously monitored.
After each pruning step, the network undergoes a fine-tuning phase, in which the remaining
parameters are retrained to recover any performance lost to pruning. This step is
essential for adapting the network to function well with fewer parameters, so that it
maintains, or even improves on, its baseline performance.
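
The layer ordering described above can be sketched as follows. Layers are plain lists of floats, pruning within a layer uses magnitude only, and `fine_tune` and `evaluate` are hypothetical callbacks standing in for the actual training and validation code; all of these are assumptions for illustration.

```python
def layerwise_prune(layers, frac, fine_tune, evaluate):
    """Prune each layer in turn (input -> output), fine-tuning after each step.

    `layers` is a list of weight lists ordered from input to output.
    `fine_tune(layers, masks)` and `evaluate(layers)` are hypothetical
    callbacks. Returns the masks and the metric recorded after each layer.
    """
    masks = [[1] * len(w) for w in layers]
    history = []
    for idx, w in enumerate(layers):
        n_prune = int(frac * len(w))
        # magnitude criterion applied within this layer only
        for j in sorted(range(len(w)), key=lambda i: abs(w[i]))[:n_prune]:
            masks[idx][j] = 0
            w[j] = 0.0
        fine_tune(layers, masks)  # recover accuracy before moving deeper
        history.append(evaluate(layers))
    return masks, history
```

Fine-tuning after every layer, rather than once at the end, lets later layers adapt to the already-pruned earlier ones, at the cost of more training passes.
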
Pruning Iterations: The pruning process is iterative, involving multiple rounds of pruning
and fine-tuning. In each iteration, a set percentage of the least significant parameters is
removed, followed by a fine-tuning phase. This iterative approach reduces network
complexity in a controlled way and keeps the performance impact small. The iterations
continue until the network reaches a target size or until further pruning would lead to
unacceptable performance degradation, balancing model compactness against accuracy.
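
Removing a fraction p of the remaining parameters in each round leaves (1 − p)^k of the original parameters after k rounds, so reaching 50% with p = 0.2 takes ⌈log 0.5 / log 0.8⌉ = 4 rounds (0.8^4 ≈ 0.41). The loop below is a sketch of the two stopping conditions in the text, a target size and a tolerated increase in validation MAE; `prune_round`, `fine_tune`, `val_mae`, and `tol` are hypothetical names, and rolling back the last round on failure is one reasonable design choice rather than something the paper specifies.

```python
import copy
import math


def rounds_needed(per_round, target_remaining):
    """Smallest k with (1 - per_round) ** k <= target_remaining."""
    return math.ceil(math.log(target_remaining) / math.log(1.0 - per_round))


def iterative_prune(model, per_round, target_remaining, tol,
                    prune_round, fine_tune, val_mae):
    """Prune -> fine-tune rounds until the target size or the MAE budget is hit.

    `prune_round(model, p)` removes a fraction p of the remaining weights in
    place and returns the fraction of original weights still present.
    `fine_tune` and `val_mae` are hypothetical training/validation callbacks.
    """
    baseline = val_mae(model)
    remaining = 1.0
    while remaining > target_remaining:
        candidate = copy.deepcopy(model)  # keep the last acceptable model
        remaining_c = prune_round(candidate, per_round)
        fine_tune(candidate)
        if val_mae(candidate) - baseline > tol:
            break  # unacceptable degradation: keep the previous round's model
        model, remaining = candidate, remaining_c
    return model, remaining


print(rounds_needed(0.2, 0.5))  # 4
```
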
Evaluation and Performance Metrics: The effectiveness of the pruned network is evaluated
using standard performance metrics such as heart rate estimation accuracy, mean absolute
error (MAE), and computational efficiency. These metrics are assessed on both the training
and validation datasets to verify that the pruned network generalizes well to unseen data.
In addition, inference time and memory usage are measured to quantify the computational
benefits of pruning. Comparative analyses with the original, unpruned network highlight
the trade-offs and gains achieved through pruning. The evaluation shows that the pruned
network meets the performance benchmarks of the larger network while offering significant
advantages in speed and resource efficiency. This methodology provides a structured
approach to pruning rPPG networks, ensuring that the resulting models are both compact
and highly performant.
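
A minimal sketch of the metrics named above, using only the standard library. It assumes heart rates are compared in beats per minute, weights are stored as dense float32 (4 bytes each, with MB taken as 2^20 bytes), and inference time is averaged over repeated calls after one warm-up call; the function names are illustrative. Note that zeroed weights only reduce storage if the pruned model is saved in a sparse or compacted form, since a dense float32 tensor with 50% zeros occupies the same memory.

```python
import time


def mae(pred_bpm, true_bpm):
    """Mean absolute error between predicted and reference heart rates (bpm)."""
    return sum(abs(p - t) for p, t in zip(pred_bpm, true_bpm)) / len(true_bpm)


def weight_storage_mb(n_params, bytes_per_param=4):
    """Dense weight storage, assuming float32 (4 bytes) and 1 MB = 2**20 bytes."""
    return n_params * bytes_per_param / 2**20


def mean_latency_s(fn, x, repeats=20):
    """Average wall-clock time of fn(x), a stand-in for measuring inference time."""
    fn(x)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(repeats):
        fn(x)
    return (time.perf_counter() - start) / repeats


print(mae([72, 80, 65], [70, 83, 65]))
print(weight_storage_mb(2**20))  # 4.0
```
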
Table 3. PhysNet architecture specification. The 2D kernel is of size H × W, and the 3D
kernel is of size T × H × W, where C, T, H, and W denote channel, time, height, and width,
respectively. Output sizes are given as C × T × H × W.

Name | Kernel | Output
Input | none | 3 × T × 192 × 128
conv2D | 5 × 5 | 32 × T × 192 × 128
maxpooling1 | 1 × 2 × 2 | 32 × T × 96 × 64
conv3D11 | 3 × 3 × 3 | 64 × T × 96 × 64
conv3D12 | 3 × 3 × 3 | 64 × T × 96 × 64
maxpooling2 | 1 × 2 × 2 | 64 × T × 48 × 32
conv3D21 | 3 × 3 × 3 | 64 × T × 48 × 32
conv3D22 | 3 × 3 × 3 | 64 × T × 48 × 32
maxpooling3 | 1 × 2 × 2 | 64 × T × 24 × 16
conv3D31 | 3 × 3 × 3 | 64 × T × 24 × 16
conv3D32 | 3 × 3 × 3 | 64 × T × 24 × 16
maxpooling4 | 1 × 2 × 2 | 64 × T × 12 × 8
conv3D41 | 3 × 3 × 3 | 64 × T × 12 × 8
conv3D42 | 3 × 3 × 3 | 64 × T × 12 × 8
avgpooling | 1 × 12 × 8 | 64 × T × 1 × 1
conv | 1 × 1 × 1 | 1 × T × 1 × 1
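
The output sizes in Table 3 can be checked, and the convolution parameters counted, with a short trace. This sketch assumes 'same' padding for every convolution (the table keeps T, H, and W unchanged across convolutions), one bias per output channel, and treats the 5 × 5 conv2D as a 1 × 5 × 5 kernel; Table 3 lists no other learnable layers, so any batch-normalization parameters are not counted. Under these assumptions the total is 832,449 parameters, consistent with the 0.83 × 10^6 listed for PhysNet in Table 1. The function name `physnet_trace` and the clip length T = 128 are illustrative.

```python
def physnet_trace(t, h=192, w=128):
    """Trace C x T x H x W output sizes and count conv parameters for Table 3.

    Assumptions: 'same' padding for every convolution, one bias per output
    channel, non-overlapping pooling, and no other learnable layers.
    """
    layers = [("conv2D", "conv", 32, (1, 5, 5))]  # 5x5 2D conv as a 1x5x5 kernel
    for stage in range(1, 5):
        layers.append((f"maxpooling{stage}", "pool", None, (1, 2, 2)))
        layers += [(f"conv3D{stage}{j}", "conv", 64, (3, 3, 3)) for j in (1, 2)]
    layers += [("avgpooling", "pool", None, (1, 12, 8)),
               ("conv", "conv", 1, (1, 1, 1))]

    c, shape, n_params, rows = 3, (t, h, w), 0, []
    for name, kind, c_out, (kt, kh, kw) in layers:
        if kind == "conv":
            n_params += c * c_out * kt * kh * kw + c_out  # weights + biases
            c = c_out
        else:
            # non-overlapping pooling divides each axis by the kernel extent
            shape = (shape[0] // kt, shape[1] // kh, shape[2] // kw)
        rows.append((name, (c,) + shape))
    return rows, n_params


rows, n_params = physnet_trace(t=128)
for name, size in rows:
    print(name, size)
print(n_params)  # 832449
```

The final 1 × 12 × 8 average pool reduces the spatial map to 1 × 1 only for the 192 × 128 input used in Table 3; other input sizes would need a different pooling extent.
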


The combination of pre-training, systematic pruning, fine-tuning, and iterative refinement 
creates a robust framework for optimizing neural networks for real-time applications and 
resource-limited environments. 


4. Experimental Results 
4.1. Dataset and Experimental Setup 


The experiments were conducted using standard rPPG datasets, which include video 
recordings of individuals with varying skin tones, lighting conditions, and motions. These 
datasets provide a diverse range of scenarios to test the robustness of the pruned networks. 
The network’s performance was evaluated based on heart rate estimation accuracy, mean 
absolute error (MAE), and computational efficiency. The experimental setup included 
training the initial rPPG network on the dataset to establish a baseline, followed by iterative 
pruning and fine-tuning as described in the methodology. 


4.2. Baseline Performance 


The initial, unpruned rPPG network was trained on the dataset to serve as a reference 
for evaluating the pruned models. This network, with its full set of parameters, achieved 
high accuracy in heart rate estimation, demonstrating the effectiveness of the network 
architecture and training process. Key performance metrics, such as the mean absolute 
error (MAE) and inference time, were recorded. The baseline performance established 
that the network was capable of accurately detecting heart rate changes under various 
conditions, providing a solid foundation for subsequent pruning experiments. 


4.3. Pruning and Fine-tuning Results 


The pruning process involved several iterations, each reducing the number of parameters
in the network while maintaining performance. After each pruning iteration, the network
was fine-tuned to adjust the remaining parameters and recover any loss in accuracy. The
results showed that the pruned networks retained a high level of accuracy, with only a
marginal increase in MAE compared to the baseline. For instance, a network with 50% of
its parameters pruned maintained over 95% of its original accuracy, demonstrating the
effectiveness of the pruning strategy. Fine-tuning was critical in ensuring that the
pruned networks remained robust and capable of accurate heart rate estimation.

Table 1. Comparison of common networks proposed for rPPG pulse extraction.

Name | Input size | # of layers | # of parameters (×10^6) | Storage (MB) | FLOPs (×10^9)
DeepPhys [3] | 3 × 150 × 36 × 36 | 9 | 1.46 | 5.70 | 9.62
HR-CNN [36] | 3 × 300 × 192 × 168 | 13 | 1.87 | 7.32 | 988.97
PhysNet [49] | 3 × 128 × 128 × 128 | 15 | 0.83 | 3.26 | 130.52
MTTS-CAN [27] | 3 × 150 × 36 × 36 | 9 | 1.45 | 5.70 | 9.61
DeeprPPG [26] | 3 × 120 × 128 × 64 | 15 | 0.54 | 2.12 | 26.76
RhythmNet [52] | 3 × 10 × 300 × 25 | 21 | 11.42 | 44.64 | 1.70

Table 2. Comparison of selected publicly available datasets used for rPPG research.

Name | # of subjects | # of frames | # of videos | Resolution | Average duration/video (s) | Storage (GB)
PURE [5°] | 10 | 125,366 | 60 | 640 × 480 | 69.9 | 38.6
COHFACE [13] | 40 | 202,092 | 164 | 640 × 480 | 61.6 | 0.662
ECG-fitness [36] | 17 | 407,232 | 202 | 1920 × 1080 | 67.2 | 1044
UBFC-rPPG [1] | 42 | 81,401 | 42 | 640 × 480 | 64.4 | 69.8
BUAA-MIHR [46] | 13 | 257,339 | 143 | 640 × 480 | 59.9 | 220
NBHR [16] | 257 | 886,001 | 1130 | 640 × 480 | 32.7 | 921
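
During fine-tuning, the removed parameters should stay removed. One common way to achieve this, assumed here, is to keep a fixed binary mask and multiply both the weights and their gradient updates by it. The sketch uses a toy linear least-squares model rather than an rPPG network so that the effect is easy to check: the pruned weight stays exactly zero while the surviving weights converge. The function name, learning rate, and toy data are illustrative assumptions.

```python
def masked_finetune(w, mask, data, lr=0.05, steps=500):
    """Gradient descent on mean squared error for y = w . x with a fixed 0/1 mask.

    Weights with mask == 0 start at zero and receive masked (zero) updates, so
    the sparsity pattern produced by pruning survives fine-tuning.
    """
    w = [wi * m for wi, m in zip(w, mask)]  # enforce the mask once up front
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        # masking the update keeps removed connections at exactly zero
        w = [wi - lr * g * m for wi, g, m in zip(w, grad, mask)]
    return w


# toy data generated by y = 2*x0 - x2, so the pruned middle weight is not needed
xs = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, -1, 2)]
data = [(x, 2 * x[0] - x[2]) for x in xs]
print(masked_finetune([1.9, 0.05, -0.8], [1, 0, 1], data))
```
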


4.4. Computational Efficiency 174 


One of the primary goals of pruning was to improve the computational efficiency of the
rPPG network, and the experimental results indicated significant improvements in this
regard. The pruned networks required substantially less memory and computation, leading
to faster inference. For example, pruning 50% of a network's parameters reduced inference
time by up to 40%, making it suitable for real-time applications on devices with limited
processing capabilities. These results underscore the potential of pruned networks for
deployment in resource-constrained environments with little loss in performance.
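
The text does not state whether pruning is unstructured (individual weights) or structured (whole channels or filters). This matters for speed: zeroed individual weights reduce the parameter count, but a dense convolution still multiplies by them, so wall-clock gains usually require structured pruning or sparse kernels. The sketch counts multiply-accumulates (MACs) for a 'same'-padded 3D convolution and shows that removing half of a layer's output channels halves its MACs, while removing half on both the input and output side cuts them to a quarter. The layer sizes follow conv3D12 in Table 3; the clip length T = 128 is an assumption.

```python
def conv3d_macs(c_in, c_out, k, t, h, w):
    """Multiply-accumulates of a 'same'-padded k x k x k 3D convolution that
    maps c_in channels to a c_out x t x h x w output (bias ignored)."""
    # every output element needs one k^3 window over each input channel
    return c_in * c_out * k ** 3 * t * h * w


# conv3D12 from Table 3 with an assumed clip length T = 128
full = conv3d_macs(64, 64, 3, 128, 96, 64)
half_out = conv3d_macs(64, 32, 3, 128, 96, 64)   # half the filters removed
half_both = conv3d_macs(32, 32, 3, 128, 96, 64)  # preceding layer pruned too
print(full, half_out / full, half_both / full)  # 86973087744 0.5 0.25
```
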


4.5. Generalization and Robustness 183 


To evaluate the generalization capabilities of the pruned networks, their performance was
tested on unseen data from the validation set. The pruned networks generalized robustly,
maintaining high accuracy and low MAE across different scenarios. This indicates that
pruning did not compromise the network's ability to adapt to new, unseen data. The pruned
models performed well across various skin tones, lighting conditions, and motion levels,
supporting their applicability in real-world scenarios.
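
Claims of robustness across skin tones, lighting, and motion are easiest to check with a per-condition error breakdown. The sketch below groups (condition, predicted, reference) records and reports the MAE for each group; the condition labels and the function name `mae_by_condition` are hypothetical, since the exact groupings used in the experiments are not listed.

```python
from collections import defaultdict


def mae_by_condition(records):
    """MAE (bpm) per condition from (condition, predicted_bpm, reference_bpm) records.

    Condition labels (e.g. skin-tone group, lighting, motion level) are
    hypothetical metadata attached to each evaluation window.
    """
    errors = defaultdict(list)
    for condition, pred, ref in records:
        errors[condition].append(abs(pred - ref))
    return {c: sum(e) / len(e) for c, e in errors.items()}


records = [("low_light", 70, 72), ("low_light", 80, 79), ("motion", 90, 84)]
print(mae_by_condition(records))  # {'low_light': 1.5, 'motion': 6.0}
```

A large gap between groups would point to a condition that pruning (or the original network) handles poorly, which an overall MAE can hide.
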


4.6. Comparative Analysis 190 


A comparative analysis between the pruned and unpruned networks highlighted the
trade-offs and benefits of pruning. While the pruned networks showed a slight increase in
MAE, this cost was minimal compared to the significant gains in computational efficiency.
The reduced inference time and memory usage of the pruned networks make them well suited
to deployment on mobile and wearable devices, where computational resources are limited.
The analysis confirmed that the pruning approach effectively balances model size,
accuracy, and efficiency, making it a viable strategy for optimizing rPPG networks.

In conclusion, the experimental results validate the proposed pruning strategy for rPPG
networks. The pruned models achieved substantial reductions in computational complexity
while maintaining high performance in heart rate estimation. These findings demonstrate
the potential of pruned rPPG networks for real-time, non-contact health monitoring
applications, particularly in resource-constrained environments.


5. Conclusion 


In this study, we introduce a hierarchical matching approach for few-shot action 
recognition. Our model, featuring a zoom-in matching module, systematically establishes 
coarse-to-fine alignment between videos, effectively measuring video similarities across 
multiple levels without excessive computational complexity. Furthermore, to cultivate 
discriminative temporal and spatial associations, we propose a mixed-supervised
hierarchical contrastive learning (HCL) algorithm. This approach leverages cycle
consistency as weak supervision in conjunction with supervised learning. We conduct
comprehensive experiments to assess the effectiveness of our proposed model across four
benchmark datasets. Remarkably, our model achieves state-of-the-art performance,
particularly excelling under the 1-shot setting. Additionally, it demonstrates superior
generalization capacity, particularly evident in more challenging cross-domain evaluations.